Abstract

A de Bruijn covering code is a q-ary string S so that every q-ary
string of length n is at most R symbol changes from some n-word appearing
consecutively in S. We introduce these codes and prove that they can
have length close to the smallest possible covering code. The proof
employs tools from field theory, probability, and linear algebra. We
also prove a number of "spectral" results on de Bruijn covering codes.
Included is a table of the best known bounds on the lengths of small
binary de Bruijn covering codes, up to R = 11 and n = 13, followed
by several open questions in this area.

1 Introduction 

A covering code C of radius R and dimension n on q symbols is a subset of
the space $[q]^n$ such that every string in $[q]^n$ differs from some element of C
in at most R coordinates. It is common to require that R be as small as
possible in the definition of a covering code, but, for the sake of notational
convenience, we do not require this here.

Question: Given n, R, and q, what is the smallest M = M(n, R, q) so that
there exists a q-ary string $S = (s_0, \ldots, s_{M-1})$ with the property that the
set of n-strings appearing as $(s_j, \ldots, s_{j+n-1})$, with indices taken modulo M,
forms a covering code of radius R? Call such a string an (n, R, q)-de Bruijn
covering code.

For example, 111000 is a (4, 1, 2)-de Bruijn covering code, because every
binary 4-string is at most one bit change from an element of

{1110, 1100, 1000, 0001, 0011, 0111}.
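The definition can be checked mechanically on small cases. The following is a minimal sketch, not part of the original development; the function names are illustrative assumptions. It lists the cyclic n-windows of a string and tests whether every word lies within Hamming distance R of one of them.

```python
from itertools import product

def cyclic_words(s, n):
    """Set of n-strings read cyclically from s (indices taken modulo len(s))."""
    m = len(s)
    return {tuple(s[(j + i) % m] for i in range(n)) for j in range(m)}

def hamming(u, v):
    """Number of coordinates in which u and v differ."""
    return sum(a != b for a, b in zip(u, v))

def is_de_bruijn_covering_code(s, n, R, alphabet):
    """True if every n-string over `alphabet` is within distance R of a window of s."""
    code = cyclic_words(s, n)
    return all(any(hamming(w, c) <= R for c in code)
               for w in product(alphabet, repeat=n))

print(sorted("".join(w) for w in cyclic_words("111000", 4)))
print(is_de_bruijn_covering_code("111000", 4, 1, "01"))
```

The first line printed is the six codewords listed above, and the second reports whether 111000 covers all binary 4-strings at radius 1.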

On the alphabet {A,G,T,C}, the string 

AGATCGCAGATATGGTCTATG 

is a (4, 2, 4)-de Bruijn covering code, by Proposition 6 below.

Clearly, $M(n, 0, q) = q^n$, since any de Bruijn covering code of radius 0 is
actually a de Bruijn cycle, and de Bruijn cycles of all orders over an arbitrary
alphabet exist. (See, for example, 0.) If we fix $R > 0$ and $q \ge 2$, how does
M(n, R, q) grow as $n \to \infty$?

It is easy to see that the growth is at least $\Omega(q^n/n^R)$, by the so-called
"sphere-covering" bound. The set of strings which differ from any given string
in at most R places has the same cardinality, $\sum_{k=0}^{R} \binom{n}{k}(q-1)^k$. Therefore,
if we are to cover all strings, we need at least

\[ \frac{q^n}{\sum_{k=0}^{R} \binom{n}{k}(q-1)^k} \]

codewords. On the other hand, it is well known that the size of the smallest
q-ary covering code of radius R actually achieves this bound, up to a
multiplicative constant which depends on R and q. (See jH] for the latest
results on the size of this constant.) We may concatenate all the codewords
of such a minimal code to yield an (n, R, q)-de Bruijn covering code of length
$O(q^n/n^{R-1})$. This construction is clearly very wasteful, however. Can we do
better, i.e., is the true order of magnitude of M(n, R, q) closer to the
sphere-covering bound? In particular, can we say something nontrivial in the case
of R = 1? In fact, in Section 3 we prove the following.

Theorem 1. For each n and q a prime power, there exists an (n, R, q)-de
Bruijn covering code of length at most $(R + 1 + o(1))\, q^n \log n / \left(\binom{n}{R}(q-1)^R\right)$.
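To get a feel for the gap between the sphere-covering lower bound and the length in Theorem 1, the two quantities can be tabulated. The sketch below is illustrative only: it drops the o(1) term, assumes the logarithm is natural, and uses hypothetical function names.

```python
from math import comb, log

def ball_size(n, R, q):
    """Number of q-ary n-strings within Hamming distance R of a fixed string."""
    return sum(comb(n, k) * (q - 1) ** k for k in range(R + 1))

def sphere_covering_bound(n, R, q):
    """Lower bound on the number of codewords: ceil(q^n / ball size)."""
    return -(-q ** n // ball_size(n, R, q))

def theorem1_leading_term(n, R, q):
    """(R + 1) q^n log n / (C(n, R) (q - 1)^R), ignoring the o(1) term."""
    return (R + 1) * q ** n * log(n) / (comb(n, R) * (q - 1) ** R)

for n in (7, 10, 13):
    print(n, sphere_covering_bound(n, 1, 2), round(theorem1_leading_term(n, 1, 2)))
```

For fixed R the two columns differ by a factor of order log n, which is the logarithmic loss discussed in the closing remarks.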

Section 2 states several definitions and preliminary results we will need
to prove this. The next section contains the proof itself, and Section 4
introduces a "spectral" perspective on de Bruijn covering codes that holds some
independent interest. In Section 5 we present bounds for special values of n,
R, and q, and include a table of bounds on M(n, R, 2) for $2 \le n \le 13$ and
$1 \le R \le 11$. We end with several remarks and questions for further work in
Section 6.

2 Preliminaries

We fix a prime power $q \ge 2$ throughout this section and the next, and take
our alphabet to be $\mathbb{F}_q$. (If q is not a prime power, we take the alphabet to
be $\mathbb{Z}/q\mathbb{Z}$.) Write $B_R(v)$, for v an n-string drawn from $\mathbb{F}_q$, to denote the set
of those strings differing from v in at most R coordinates. That is, $B_R(v)$ is
v's radius R neighborhood in the Hamming metric. Also, write wt(v) for the
Hamming weight of the vector v, the number of nonzero symbols it contains.

Let $\alpha$ be a generator of the multiplicative group of the finite field $\mathbb{F}_{q^n}$.
Denote by $\mathcal{E}$ the elementary basis for $\mathbb{F}_q^n$ over $\mathbb{F}_q$. Given a basis
$\mathcal{B} = \{b_1, \ldots, b_n\}$ of $\mathbb{F}_{q^n}$ over $\mathbb{F}_q$ and an element $\gamma \in \mathbb{F}_{q^n}$, write
$f_{\mathcal{B}}(\gamma)$ for the element of $\mathbb{F}_q^n$ whose $j$-th coordinate is the coefficient
of $b_j$ in the $\mathcal{B}$-representation of $\gamma$. Then, given a nonzero vector
$x \in \mathbb{F}_q^n$, define $\Lambda(\alpha, \mathcal{B}, x)$ to be the string whose $j$-th
coordinate (i.e., $\Lambda_j(\alpha, \mathcal{B}, x)$, $1 \le j \le q^n - 1$) is $x^T f_{\mathcal{B}}(\alpha^j)$. It is well known
that, when $\mathcal{B} = \{\alpha^j : 0 \le j \le n - 1\}$ and wt(x) = 1, $\Lambda(\alpha, \mathcal{B}, x)$ is a de Bruijn
cycle of order n if we insert a 0 at the beginning. (See, for example, [S].)
We generalize this result as follows. Define $\Lambda^*(\alpha, \mathcal{B}, x)$ to be the sequence
$\Lambda(\alpha, \mathcal{B}, x)$ with a zero inserted at the beginning of each occurrence of the
string $0 \cdots 0\,1$. Then we have the following.

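Proposition 2. Fix a basis $\mathcal{B}$ of $\mathbb{F}_{q^n}$ over $\mathbb{F}_q$, a generator $\alpha \in \mathbb{F}_{q^n}$, and a
nonzero vector $x \in \mathbb{F}_q^n$, and write $\Phi(j)$ for the vector

\[ \left(\Lambda_j(\alpha, \mathcal{B}, x), \ldots, \Lambda_{j+n-1}(\alpha, \mathcal{B}, x)\right)^T \in \mathbb{F}_q^n. \]

The map $\Psi$ which sends 0 to 0 and $\alpha^j$ to $\Phi(j)$ is an isomorphism from the
additive group of $\mathbb{F}_{q^n}$ to $\mathbb{F}_q^n$.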

Proof. First, we show that $\Psi$ is linear. Write $e_j$ for the elementary n-vector
whose coordinates are all zero except for a 1 in the $j$-th coordinate. We denote
by $M_{\gamma,\mathcal{B}}$ the matrix representing multiplication by $\gamma \in \mathbb{F}_{q^n}$ in the $\mathcal{B}$ basis. It
is easy to see that

\[ \Lambda_j(\alpha, \mathcal{B}, x) = x^T f_{\mathcal{B}}(\alpha^j) \]

and therefore that

\[ \Psi(\gamma) = \sum_{j=0}^{n-1} e_{j+1}\, x^T f_{\mathcal{B}}(\alpha^j \gamma) = \sum_{j=0}^{n-1} e_{j+1}\, x^T M_{\alpha,\mathcal{B}}^{j}\, f_{\mathcal{B}}(\gamma), \qquad (1) \]

which is obviously linear.

Now, suppose that $\Psi(\gamma) = 0$. We show that $\gamma = 0$. Indeed, suppose that
$\{j_1, \ldots, j_n\}$ are n distinct integers so that $\Lambda_{j_i}(\alpha, \mathcal{B}, x) = 0$ for each i. If we
denote by S the subspace of $\mathbb{F}_q^n$ orthogonal to x, then we have $\alpha^{j_i} \in f_{\mathcal{B}}^{-1}(S)$
for each i. However, $f_{\mathcal{B}}$ is linear and has a trivial kernel, so all the $\alpha^{j_i}$ lie in
a subspace of $\mathbb{F}_{q^n}$ of dimension n - 1 and are therefore linearly dependent. If
we take $j_i = j + i$ for some j (i.e., $\Psi(\gamma) = 0$ with $\gamma = \alpha^j$), then we have that
$\{\alpha^{j+i}\}_{i=0}^{n-1}$ is a dependent set. Since $M_{\alpha,\mathcal{B}}^{j}$ is nonsingular, this implies that
$\{\alpha^{i}\}_{i=0}^{n-1}$ is a dependent set. But then we have

\[ \sum_{i=0}^{n-1} c_{i+1}\, \alpha^i = 0 \]

for some nonzero $(c_1, \ldots, c_n)$, so $\alpha$ satisfies a polynomial identity of degree
less than n. Since $\alpha$ generates $\mathbb{F}_{q^n}$, this implies that $\{\alpha^j\}_{j=0}^{d}$ is a basis for
$\mathbb{F}_{q^n}$ for some d < n - 1, contradicting the fact that the dimension of $\mathbb{F}_{q^n}$ over
$\mathbb{F}_q$ is n. We can therefore conclude that $\gamma = 0$. □

Note that the map $\gamma \mapsto M_{\gamma,\mathcal{B}}$ is actually an isomorphism of fields. The
image is a set of matrices which form a field, i.e., a matrix field. These
objects have been studied extensively and thoroughly characterized when
the matrices take their entries from a finite field ([2]).

Corollary 3. $\Lambda^*(\alpha, \mathcal{B}, x)$ is a de Bruijn cycle.

Proof. By the above argument, $\Lambda(\alpha, \mathcal{B}, x)$ contains all nonzero n-strings.
Clearly, the insertion of a 0 causes the occurrence of the all-zeroes string
without disrupting the presence of any other string. □
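A small binary illustration of the corollary, offered as a sketch rather than as the construction itself: over $\mathbb{F}_2$ the sequence can be generated by a linear recurrence whose characteristic polynomial is primitive. The polynomial $x^3 + x + 1$, the seed, and the function names below are assumptions chosen for the example.

```python
def m_sequence(coeffs, seed):
    """One period (length 2^n - 1) of s[k+n] = sum(coeffs[i] * s[k+i]) mod 2."""
    n = len(seed)
    s = list(seed)
    for k in range(2 ** n - 1 - n):
        s.append(sum(c * s[k + i] for i, c in enumerate(coeffs)) % 2)
    return s

def cyclic_windows(s, n):
    """Set of n-strings read cyclically from s."""
    m = len(s)
    return {tuple(s[(j + i) % m] for i in range(n)) for j in range(m)}

def insert_zero(s, n):
    """Insert a 0 in front of the cyclic occurrence of 0^(n-1) 1."""
    m = len(s)
    target = (0,) * (n - 1) + (1,)
    for j in range(m):
        if tuple(s[(j + i) % m] for i in range(n)) == target:
            return [0] + s[j:] + s[:j]  # rotate so the occurrence comes first
    raise ValueError("no run of n-1 zeros followed by a 1")

n = 3
seq = m_sequence([1, 1, 0], [0, 0, 1])  # x^3 + x + 1: s[k+3] = s[k] + s[k+1]
cycle = insert_zero(seq, n)
print(seq, len(cyclic_windows(seq, n)))      # all 7 nonzero 3-strings
print(cycle, len(cyclic_windows(cycle, n)))  # all 8 binary 3-strings
```

Before the insertion the period-7 sequence contains every nonzero 3-string once; after it, the length-8 string is a de Bruijn cycle of order 3.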

Our approach is to find an $\alpha \in \mathbb{F}_{q^n}$, a basis $\mathcal{B}$, and a vector x so that the
first $N \approx q^n \log n / \left(\binom{n}{R}(q-1)^R\right)$ length n strings appearing in $\Lambda(\alpha, \mathcal{B}, x)$ are
(almost) a covering code of radius R. Specifically, we wish to show that, for
only a small fraction of all $v \in \mathbb{F}_q^n$,

\[ (v + B_R(0)) \cap \Psi\left(\{\alpha^j\}_{j=1}^{N}\right) = \emptyset, \]

where $\Psi$ is the function defined in Proposition 2. Setting $w = \Psi^{-1}(v)$, we
may bound this quantity from above by asking the number of w for which, by (1),

\[ \left\{ \left( \sum_{i=0}^{n-1} e_{i+1}\, x^T M_{\alpha,\mathcal{B}}^{i} \right) v : \mathrm{wt}(v) = R \right\} \cap f_{\mathcal{B}}\left(w + \{\alpha^j\}_{j=1}^{N}\right) = \emptyset. \]

We must determine which matrices may appear in the form of the left-hand
term. First, a result from linear algebra is needed, which we quote as
Theorem 4. A non-derogatory matrix is one whose eigenspaces are all
one-dimensional, and a matrix in rational canonical form is comprised of blocks
of the form

\[ \begin{pmatrix} 0 & 0 & \cdots & 0 & a_1 \\ 1 & 0 & \cdots & 0 & a_2 \\ 0 & 1 & \cdots & 0 & a_3 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & a_n \end{pmatrix} \]

along the diagonal.



Theorem 4. If $A \in K^{n \times n}$ is non-derogatory and in rational canonical form,
then the following are equivalent:

1. X commutes with A.

2. The successive columns of X are $v, Av, \ldots, A^{n-1}v$ for any $v \in K^n$.

3. There exists a polynomial $g \in K[x]$ so that X = g(A).
Furthermore, $g = \sum_{i=0}^{n-1} v_{i+1} x^i$.
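The relation between items 2 and 3 can be checked numerically on a single companion block. The sketch below works over $\mathbb{F}_2$ with the companion matrix of $x^3 + x + 1$; the field, the polynomial, and all function names are assumptions made for illustration.

```python
from itertools import product

P = 2  # arithmetic is done modulo 2, i.e. over F_2 (an assumption of this sketch)

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) % P for j in range(n)]
            for i in range(n)]

def mat_vec(X, v):
    return [sum(X[i][k] * v[k] for k in range(len(v))) % P for i in range(len(X))]

def columns_matrix(A, v):
    """Matrix whose successive columns are v, Av, ..., A^(n-1) v."""
    n = len(A)
    cols, w = [], list(v)
    for _ in range(n):
        cols.append(w)
        w = mat_vec(A, w)
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def poly_of_matrix(A, v):
    """g(A) for g(x) = v[0] + v[1] x + ... + v[n-1] x^(n-1)."""
    n = len(A)
    G = [[0] * n for _ in range(n)]
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # A^0
    for coeff in v:
        G = [[(G[i][j] + coeff * power[i][j]) % P for j in range(n)]
             for i in range(n)]
        power = mat_mul(power, A)
    return G

# Companion matrix of x^3 + x + 1 over F_2: columns e_2, e_3, (1, 1, 0)^T.
A = [[0, 0, 1],
     [1, 0, 1],
     [0, 1, 0]]

checks = []
for v in product(range(P), repeat=3):
    X = columns_matrix(A, v)
    checks.append(X == poly_of_matrix(A, v) and mat_mul(X, A) == mat_mul(A, X))
print(all(checks))
```

For each of the eight vectors v, the matrix with columns v, Av, A^2 v coincides with g(A) and commutes with A, as the theorem predicts for a single companion block.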



The matrices $M_{\alpha,\mathcal{B}}$ are non-derogatory when $\alpha$ is a generator of $\mathbb{F}_{q^n}$,
because their eigenvalues are all distinct, as the next result states.

Proposition 5. A matrix $M \in \mathbb{F}_q^{n \times n}$ is of the form $M_{\alpha,\mathcal{B}}$ for some generator
$\alpha \in \mathbb{F}_{q^n}$ and basis $\mathcal{B} \subset \mathbb{F}_{q^n}$ over $\mathbb{F}_q$ if and only if its eigenvalues (over the
algebraic closure of $\mathbb{F}_q$) are $\{\alpha^{q^k}\}_{k=0}^{n-1}$.

Proof. For a given $\alpha$, fix the basis $\mathcal{A} = \{\alpha^j\}_{j=0}^{n-1}$. Clearly, if we write B for the
matrix whose columns are $\mathcal{B}$ written in the basis $\mathcal{A}$, then $M_{\alpha,\mathcal{B}} = B^{-1} M_{\alpha,\mathcal{A}} B$.
Therefore, a matrix M is one of the desired ones if and only if it has the
same eigenvalues as the matrix $M_{\alpha,\mathcal{A}}$. Let $p_\alpha(\lambda)$ denote the characteristic
polynomial of this matrix. By the Cayley-Hamilton Theorem (which applies
to all commutative rings), $p_\alpha(M_{\alpha,\mathcal{A}}) = 0$. However, the map $\alpha \mapsto M_{\alpha,\mathcal{B}}$ is
an isomorphism of fields for any basis $\mathcal{B}$. Therefore, $p_\alpha(\alpha) = 0$. Since the
Galois group of $\mathbb{F}_{q^n}$ over $\mathbb{F}_q$ is cyclic and generated by the Frobenius map
$x \mapsto x^q$, and the rest of the roots of $p_\alpha$ are the Galois conjugates of $\alpha$, the
result follows. □

Furthermore, if we let $\mathcal{E}_\alpha$ denote the basis $\{\alpha^j\}_{j=0}^{n-1}$, then $M_{\alpha,\mathcal{E}_\alpha}$ is in
rational canonical form. Its $j$-th column is $e_{j+1}$ for $1 \le j \le n - 1$ and its $n$-th
column is the vector of coefficients of the minimal polynomial of $\alpha$ (without
the leading term). Using this fact, we can prove the following from Theorem 4.

Lemma 6. Fix a generator $\alpha$ of $\mathbb{F}_{q^n}$. Choose $x \in \mathbb{F}_q^n \setminus \{0^n\}$ randomly and
uniformly, and choose a basis $\mathcal{B}$ randomly and uniformly. Then

\[ \sum_{i=0}^{n-1} e_{i+1}\, x^T M_{\alpha,\mathcal{B}}^{i} \]

is distributed uniformly over all invertible matrices.

Proof. Evidently, it suffices to show that $D(\mathcal{B}, x) = \sum_{i=0}^{n-1} e_{i+1}\, x^T M_{\alpha,\mathcal{B}}^{i}$ is
distributed uniformly. This matrix is one whose rows are $x^T, x^T M_{\alpha,\mathcal{B}}, \ldots,
x^T M_{\alpha,\mathcal{B}}^{n-1}$. Write A for the matrix $M_{\alpha,\mathcal{E}_\alpha}$, P for the matrix whose
successive columns are the elements of $\mathcal{E}_\alpha$ written in the $\mathcal{B}$ basis, and write y
for $P^T x$. Then we may also say that $D(\mathcal{B}, x)$ is the matrix whose rows are
$x^T, x^T P A P^{-1}, \ldots, x^T P A^{n-1} P^{-1}$, which we may rewrite as $D(\mathcal{E}_\alpha, P^T x) P^{-1}$.
Therefore, by Theorem 4 and the fact that A is non-derogatory and in
rational canonical form, $D(\mathcal{B}, x) = g_y(A) P^{-1}$, with $g_y$ denoting the polynomial
whose coefficients are the entries of y. Choosing x uniformly and randomly
from the nonzero vectors yields the same distribution on y, independent of
the choice of $\mathcal{B}$. Since A is the image of $\alpha$ under the map $\alpha \mapsto M_{\alpha,\mathcal{E}_\alpha}$,
and $g_y(\alpha)$ is uniformly distributed over $\mathbb{F}_{q^n} \setminus \{0\}$ as y varies, we have $g_y(A)$
uniformly distributed over all matrices of the form $M_{\gamma,\mathcal{E}_\alpha}$ for $\gamma \in \mathbb{F}_{q^n} \setminus \{0\}$.
Choosing $\mathcal{B}$ uniformly is the same as choosing P uniformly, so we may
conclude that $D(\mathcal{B}, x) = g_y(A) P^{-1}$ is uniformly distributed over all invertible
matrices. □



It remains to show that the set of all sums of k columns of a randomly,
uniformly chosen invertible matrix are distributed more or less uniformly.
Before proceeding, we need to state Suen's Inequality. We follow [1]. Let
$\{A_i\}_{i \in I}$ be a set of events, and define a symmetric relation (i.e., a graph) $\sim$
on I. We say that $\sim$ is a superdependency graph if, whenever $J_1, J_2 \subset I$ have
no edges between them, any Boolean combination of $\{A_j\}_{j \in J_1}$ is independent
of any Boolean combination of $\{A_j\}_{j \in J_2}$. Write $M = \prod_{i \in I} \Pr[\bar{A}_i]$.

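Theorem 7 (Suen's Inequality). Define

\[ y(i,j) = \left(\Pr[A_i \wedge A_j] + \Pr[A_i]\Pr[A_j]\right) \prod_{l \sim i \text{ or } l \sim j} \left(1 - \Pr[A_l]\right)^{-1}. \]

Then

\[ \Pr\Big[\bigwedge_{i \in I} \bar{A}_i\Big] \le M \exp\Big(\sum_{i \sim j} y(i,j)\Big). \]

The following is a routine application of this result.

Proposition 8. For $R \in \mathbb{N}$, if M is chosen randomly and uniformly from
$GL_n(\mathbb{F}_q)$, then, for any set $S \subset \mathbb{F}_q^n$ with $|S| = q^n K / \left(\binom{n}{R}(q-1)^R\right)$,

\[ \Pr\left[\{Mv : \mathrm{wt}(v) = R\} \cap S = \emptyset\right] \le e^{-K}\left(c_q^{-1} + o(1)\right), \]

where $c_q = \prod_{j=1}^{\infty} (1 - q^{-j})$ and $K = o(\sqrt{n})$.

Proof. The probability that a randomly, uniformly chosen invertible matrix
has all sums of k columns lying outside of a set S is given by

\[ p = \Pr[Mv \notin S \text{ when } \mathrm{wt}(v) = R \mid M \in GL_n(\mathbb{F}_q)]
   = \frac{\Pr[(Mv \notin S \text{ when } \mathrm{wt}(v) = R) \wedge (M \in GL_n(\mathbb{F}_q))]}{\Pr[M \in GL_n(\mathbb{F}_q)]}
   \le \frac{\Pr[Mv \notin S \text{ when } \mathrm{wt}(v) = R]}{\Pr[M \in GL_n(\mathbb{F}_q)]}, \]

where we are choosing M randomly and uniformly from all matrices. It
is well known that $|GL_n(\mathbb{F}_q)| = q^{n^2}(c_q + o(1))$ with $c_q = \prod_{j=1}^{\infty} (1 - q^{-j})$.
Therefore,

\[ p \le \Pr[Mv \notin S \text{ when } \mathrm{wt}(v) = R]\left(c_q^{-1} + o(1)\right). \]

Now, for a vector v of weight R, define $A_v$ to be the event that $Mv \in S$,
and let I(v) denote the set of indices at which v is nonzero. Then
$\Pr[Mv \notin S \text{ when } \mathrm{wt}(v) = R] = \Pr[\bigwedge_v \bar{A}_v]$. The relation $v \sim w$ iff $I(v) \cap I(w) \ne \emptyset$
clearly defines a superdependency graph on these events. Furthermore, any
pair $A_v$ and $A_w$, $v \ne w$, are independent, since, if we fix the $i$-th columns
of M for $i \in I(v) \cap I(w)$, then $\sum_{i \in I(v) \setminus I(w)} v_i M e_i$ and $\sum_{i \in I(w) \setminus I(v)} w_i M e_i$ are
independent and uniformly distributed over $\mathbb{F}_q^n$. Therefore,

\[ y(v,w) = 2\Pr[A_v]\Pr[A_w] \prod_{z \sim v \text{ or } z \sim w} \left(1 - \Pr[A_z]\right)^{-1} \]
\[ \le 2\left(\frac{K}{\binom{n}{R}(q-1)^R}\right)^2 \left(1 - \frac{K}{\binom{n}{R}(q-1)^R}\right)^{-2\left(\binom{n}{R}(q-1)^R - \binom{n-R}{R}(q-1)^R\right)} \]
\[ \le 2\left(\frac{K}{\binom{n}{R}(q-1)^R}\right)^2 \left(1 - \frac{K}{\binom{n}{R}(q-1)^R}\right)^{\binom{n}{R}(-2R^2 + o(1))(q-1)^R / n} \]
\[ = 2\left(\frac{K}{\binom{n}{R}(q-1)^R}\right)^2 e^{K(2R^2 + o(1))/n}. \]

Since there are $\binom{n}{R}\left(\binom{n}{R} - \binom{n-R}{R}\right)(q-1)^{2R}/2 = O(n^{2R-1})$ relations $v \sim w$,
the quantity $\sum_{v \sim w} y(v,w)$ tends to 0 as $n \to \infty$ so long as $K = o(\sqrt{n})$.
Therefore, Suen's Inequality implies that

\[ p \le \left(c_q^{-1} + o(1)\right) \Pr\Big[\bigwedge_{\mathrm{wt}(v) = R} \bar{A}_v\Big]
   \le \left(c_q^{-1} + o(1)\right) \prod_{\mathrm{wt}(v) = R} \Pr[\bar{A}_v]
   = \left(c_q^{-1} + o(1)\right) \left(1 - \frac{K}{\binom{n}{R}(q-1)^R}\right)^{\binom{n}{R}(q-1)^R}
   \le \left(c_q^{-1} + o(1)\right) e^{-K}. \]

□

3 The Main Result
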

Taking an initial segment of a random $\Lambda(\alpha, \mathcal{B}, x)$ and adding in all the
"uncovered" codewords yields an (n, R, q)-de Bruijn covering code.

Theorem 2. For each n, there exists an (n, R, q)-de Bruijn covering code of
length at most $(R + 1 + o(1))\, q^n \log n / \left(\binom{n}{R}(q-1)^R\right)$.

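Proof. Fix any generator $\alpha \in \mathbb{F}_{q^n}$. Choose the basis $\mathcal{B} = \{b_1, \ldots, b_n\}$ and the
vector $x \in \mathbb{F}_q^n \setminus \{0^n\}$ randomly and uniformly. Then define $\Lambda(K)$ to be the
string of the first $q^n K / \left(\binom{n}{R}(q-1)^R\right) + n$ symbols of $\Lambda(\alpha, \mathcal{B}, x)$ (which we will
call $\Lambda_1(K)$), followed by a concatenated list (which we will call $\Lambda_2(K)$) of all
strings in

\[ \mathbb{F}_q^n \setminus \bigcup_{c \in C} B_R(c), \]

where C is the set of codewords appearing as n consecutive symbols (without
wrap-around) in $\Lambda_1(K)$. Then the resulting expected length of the string is
given by

\[ E\left(|\Lambda_1(K)| + |\Lambda_2(K)|\right) = \frac{q^n K}{\binom{n}{R}(q-1)^R} + n + n \sum_{v \in \mathbb{F}_q^n} \Pr[B_R(v) \cap C = \emptyset]. \qquad (2) \]

Furthermore, the constructed string is an (n, R, q)-de Bruijn covering code.
By the discussion preceding Theorem 4, $\Pr[B_R(v) \cap C = \emptyset]$ is bounded above
by

\[ \Pr\left[ \left\{ \left( \sum_{i=0}^{n-1} e_{i+1}\, x^T M_{\alpha,\mathcal{B}}^{i} \right) w : \mathrm{wt}(w) = R \right\} \cap f_{\mathcal{B}}\left(v + \{\alpha^j\}_{j \ge 1}\right) = \emptyset \right], \]

where j ranges over the indices of the codewords in C. The matrix in the
left-hand term is uniformly distributed over all invertible
matrices, by Lemma 6. Therefore, by Proposition 8,

\[ \Pr[B_R(v) \cap C = \emptyset] \le e^{-K}\left(c_q^{-1} + o(1)\right). \]

Plugging this and $K = (R + 1) \log n$ into (2) yields

\[ E\left(|\Lambda_1(K)| + |\Lambda_2(K)|\right) \le (R + 1 + o(1))\, \frac{q^n \log n}{\binom{n}{R}(q-1)^R}, \]

so an (n, R, q)-de Bruijn covering code of the desired length exists. □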
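The two-part construction in this proof (an initial segment followed by a list of the uncovered words) can be sketched for the binary case. This is an illustration under stated assumptions, not the proof's random choice: the sequence is generated by a fixed linear recurrence (here with characteristic polynomial $x^5 + x^2 + 1$), the logarithm is taken to be natural, and the function names are hypothetical.

```python
from itertools import product
from math import comb, log

def lfsr_bits(coeffs, seed, length):
    """First `length` bits of s[k+n] = sum(coeffs[i] * s[k+i]) mod 2."""
    n = len(seed)
    s = list(seed)
    while len(s) < length:
        k = len(s) - n
        s.append(sum(c * s[k + i] for i, c in enumerate(coeffs)) % 2)
    return s[:length]

def covered(word, codewords, R):
    """True if `word` is within Hamming distance R of some codeword."""
    return any(sum(a != b for a, b in zip(word, c)) <= R for c in codewords)

def build_code(coeffs, seed, R):
    """Initial segment (Lambda_1) followed by all uncovered words (Lambda_2)."""
    n = len(seed)
    K = (R + 1) * log(n)
    head_len = int(2 ** n * K / comb(n, R)) + n
    head = lfsr_bits(coeffs, seed, head_len)
    C = {tuple(head[j:j + n]) for j in range(head_len - n + 1)}  # no wrap-around
    tail = [w for w in product((0, 1), repeat=n) if not covered(w, C, R)]
    return head + [b for w in tail for b in w]

def is_db_covering(s, n, R):
    """Check the cyclic covering property directly."""
    m = len(s)
    C = {tuple(s[(j + i) % m] for i in range(n)) for j in range(m)}
    return all(covered(w, C, R) for w in product((0, 1), repeat=n))

code = build_code([1, 0, 1, 0, 0], [0, 0, 0, 0, 1], R=1)  # n = 5
print(len(code), is_db_covering(code, 5, 1))
```

The covering property holds by construction regardless of how good the initial segment is; the content of the theorem is that, for a random choice, the appended list is short in expectation.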
4 A Spectral Perspective

In this section, we describe a "spectral" test to see whether a given string is a
de Bruijn covering code, and apply it to a probabilistic construction. Define
$e_n(x) = e^{2\pi i x/n}$, as is standard notation.

Proposition 3. Let $S = (S(0), \ldots, S(M - 1))$ be a q-ary string, for any
q > 1. Then S is a de Bruijn covering code of radius R and dimension n if
and only if the quantity

\[ Q = \prod_{\omega=0}^{q^n-1} \sum_{j=0}^{M-1} \sum_{v : \mathrm{wt}(v) \le R} \sum_{m=0}^{q^n-1} e_{q^n}\left( m\left(\omega - \sum_{i=0}^{n-1} \left(S(i+j) + v_i \bmod q\right) q^i \right) \right) \qquad (3) \]

is positive, where v varies over the set of q-ary sequences $(v_0, \ldots, v_{n-1})$ and
the index of S is written modulo M. Otherwise, this expression is zero.

Proof. In what follows, all parameters vary over the ranges indicated in the
statement above. Note that

\[ \sum_{m} e_{q^n}\left(m(\omega - \omega')\right) \]

is positive if $\omega \equiv \omega' \bmod q^n$, and zero otherwise. If we represent a q-ary
word as an integer base q, then the $j$-th word appearing in S is $\sum_i S(i + j) q^i$,
and, if $\mathrm{wt}(v) \le R$, this quantity plus $\sum_k v_k q^k$ (digits added independently
modulo q) is the $j$-th word with symbols altered in at most R coordinates.
Therefore, the quantity

\[ \sum_{v} \sum_{m} e_{q^n}\left( m\left(\omega - \sum_{i} \left(S(i+j) + v_i \bmod q\right) q^i \right) \right) \]

is positive if and only if the $j$-th word of S is at most a distance R from the
word which is $\omega$ written base q. Taking the sum over j and then the product
over $\omega$, we get that Q is positive if and only if S is an (n, R, q)-de Bruijn
covering code, and is zero otherwise. □
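The quantity in (3) can be evaluated directly for very small parameters. The sketch below is illustrative; the function names are assumptions, and each inner sum is rounded to the nearest integer because in exact arithmetic it is an integer multiple of $q^n$.

```python
import cmath
from itertools import product

def e(N, x):
    """The additive character e_N(x) = exp(2 pi i x / N)."""
    return cmath.exp(2j * cmath.pi * x / N)

def spectral_Q(S, n, R, q=2):
    """Evaluate the product in (3); factors are rounded to integers."""
    M, N = len(S), q ** n
    vs = [v for v in product(range(q), repeat=n)
          if sum(c != 0 for c in v) <= R]
    Q = 1
    for w in range(N):
        total = 0
        for j in range(M):
            for v in vs:
                word = sum(((S[(i + j) % M] + v[i]) % q) * q ** i
                           for i in range(n))
                total += sum(e(N, m * (w - word)) for m in range(N))
        Q *= round(total.real)
    return Q

print(spectral_Q([1, 1, 1, 0, 0, 0], 4, 1) > 0)  # 111000 is a (4, 1, 2) code
print(spectral_Q([0, 0, 0, 0], 4, 1) == 0)       # 0000 covers only weight <= 1
```

The cost grows like $q^{2n} M$ times the ball size, so this is only a sanity check on the statement, not a practical test.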

Consider the expected value of the above expression when we take a
randomly, uniformly chosen binary string $S \in \{0, 1\}^M$. Clearly, an (n, R, 2)-
de Bruijn covering code of length M exists if and only if this expected value
is positive, since Q is always nonnegative.

Theorem 4. An (n, R, 2)-de Bruijn covering code of length M exists if and
only if

\[ \sum_{J, v, m} e_{2^n}\left( \sum_{\omega=0}^{2^n-1} m_\omega \left(\omega - \frac{2^n - 1}{2}\right) \right) \prod_{l=0}^{M-1} \cos\left( \pi \sum_{i,\omega} m_\omega \left(1 - 2v_{\omega,i}\right) 2^{i-n} \right) > 0, \]

where i and $\omega$ range over all pairs so that $0 \le i \le n - 1$, $0 \le \omega \le 2^n - 1$,
and $i + j_\omega \equiv l \bmod M$, and the ranges of the other parameters are given by

\[ J \in \{0, \ldots, M-1\}^{2^n}, \]
\[ m \in \{0, \ldots, 2^n - 1\}^{2^n}, \]
\[ v \in \{v \in \{0,1\}^n : \mathrm{wt}(v) \le R\}^{2^n}. \]

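Proof. First, rewrite Q by moving the product inside and collecting terms
involving the same digits of S:

\[ Q = \sum_{J, v, m} e_{2^n}\left( \sum_{\omega=0}^{2^n-1} m_\omega\, \omega \right) \prod_{l=0}^{M-1} e_{2^n}\left( - \sum_{i,\omega} m_\omega \left(S(l) + v_{\omega,i} \bmod 2\right) 2^i \right). \qquad (4) \]

If X is a random variable with two equally probable values A and B, then
$E[e_M(X)] = e_M((A + B)/2) \cos(\pi(A - B)/M)$. Taking the expected value of
(4) therefore gives

\[ \sum_{J, v, m} e_{2^n}\left( \sum_{\omega=0}^{2^n-1} m_\omega\, \omega \right) \prod_{l=0}^{M-1} e_{2^n}\left( - \sum_{i,\omega} m_\omega 2^{i-1} \right) \cos\left( \pi \sum_{i,\omega} m_\omega \left(1 - 2v_{\omega,i}\right) 2^{i-n} \right), \]

since the digits of S are independent. We may simplify this expression to

\[ \sum_{J, v, m} e_{2^n}\left( \sum_{\omega=0}^{2^n-1} m_\omega \left(\omega - \frac{2^n - 1}{2}\right) \right) \prod_{l=0}^{M-1} \cos\left( \pi \sum_{i,\omega} m_\omega \left(1 - 2v_{\omega,i}\right) 2^{i-n} \right). \]

□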
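The two-point expectation identity used in this proof is easy to confirm numerically. The following sketch (function names are illustrative assumptions) compares the average of $e_M$ over two equally likely values with the closed form.

```python
import cmath

def e(M, x):
    """The additive character e_M(x) = exp(2 pi i x / M)."""
    return cmath.exp(2j * cmath.pi * x / M)

def expectation_two_point(M, A, B):
    """E[e_M(X)] when X takes the values A and B with probability 1/2 each."""
    return (e(M, A) + e(M, B)) / 2

def closed_form(M, A, B):
    """e_M((A + B)/2) * cos(pi (A - B) / M)."""
    return e(M, (A + B) / 2) * cmath.cos(cmath.pi * (A - B) / M)

for M, A, B in [(8, 3, 5), (16, 0, 7), (32, -4, 11)]:
    print(abs(expectation_two_point(M, A, B) - closed_form(M, A, B)) < 1e-12)
```

The identity is the usual sum-to-product formula for two unit complex numbers, applied with angles $2\pi A/M$ and $2\pi B/M$.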



Unfortunately, this result does not yield a practical means of calculating 
M{n, R,2), due to the large number of terms. Furthermore, it is unlikely 
that much cancellation can be identified in this sum, given the NP-hardness 
of determining a code's covering radius [3]. It may be possible, however, to
exploit approximation algorithms for vertex-coverings to find a much simpler 
sum which yields a reasonable bound. 

We also offer the following, in the spirit of the above results.

Proposition 5. Let $S = (S(0), \ldots, S(M - 1))$ be a q-ary string, for any
q > 1, and denote by X the union of the radius R balls about each codeword
appearing as an n-string in S. Then the number of points of $[q]^n$ not covered
by X is at most

\[ \sum_{\omega=0}^{q^n-1} \sum_{k=0}^{\infty} \frac{1}{k!} \left( -q^{-n} \sum_{j=0}^{M-1} \sum_{\mathrm{wt}(v) \le R} \sum_{m=0}^{q^n-1} e_{q^n}\left( m\left(\omega - \sum_{i=0}^{n-1} \left(S(i+j) + v_i \bmod q\right) q^i \right) \right) \right)^{k}, \]

where v varies over the set of q-ary sequences $(v_0, \ldots, v_{n-1})$ and the index of
S is written modulo M.

Proof. As above, the quantity

\[ T(\omega) = q^{-n} \sum_{j=0}^{M-1} \sum_{\mathrm{wt}(v) \le R} \sum_{m=0}^{q^n-1} e_{q^n}\left( m\left(\omega - \sum_{i=0}^{n-1} \left(S(i+j) + v_i \bmod q\right) q^i \right) \right) \]

counts the number of times that $\omega$ is covered. Therefore $\sum_{\omega} e^{-T(\omega)}$ is at
least the number of uncovered points. □

One might conjecture that a sufficiently long sequence S whose Fourier
coefficients $\hat{S}(k)$ are small, for $k \ne 0$, covers all but a small fraction of
Hamming space. To avoid trivial cases, we must restrict our attention to
sequences with approximately the same number of each symbol. However,
this statement is false even in the binary case, as illustrated by the following
simple example.

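Define $S = (S(0), \ldots, S(M - 1))$, M even, by $(S(2j), S(2j + 1)) = (0, 1)$
with probability 1/2 and (1, 0) with probability 1/2, each pair chosen
independently. Clearly, S has the same number of 1's as 0's. The $k$-th Fourier
coefficient, $k \ne 0$, has square magnitude

\[ |\hat{S}(k)|^2 = \sum_{u,v=0}^{M-1} e_M(k(u - v))\, S(u) S(v). \]

The values of S(u) and S(v) are independent if $|u - v| > 1$, so the expected
value of the above expression is

\[ E\left[|\hat{S}(k)|^2\right] = \sum_{u,v=0}^{M-1} e_M(k(u - v))\, E[S(u)S(v)] \]
\[ = \frac{1}{4} \sum_{u=0}^{M-1} \sum_{|u - v| > 1} e_M(k(u - v)) + \sum_{|u - v| \le 1} e_M(k(u - v))\, E[S(u)S(v)] \]
\[ \le \frac{1}{4} \left| \sum_{u,v=0}^{M-1} e_M(k(u - v)) \right| + \frac{1}{4} \sum_{|u - v| \le 1} 1 + \sum_{|u - v| \le 1} 1 \]
\[ \le \frac{1}{4} \left| \sum_{u=0}^{M-1} e_M(ku) \right|^2 + \frac{3M}{4} + 3M = \frac{15M}{4}, \]

since the exponential sum vanishes for $k \ne 0$ and there are at most 3M pairs
(u, v) with $|u - v| \le 1$.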

Any n-word appearing in S has weight either $\lfloor n/2 \rfloor$ or $\lceil n/2 \rceil$. Therefore,
there exists a sequence S of length M with Fourier coefficients $\hat{S}(k) \ll \sqrt{M}$
so that, for any fixed R, the number of words at most a distance R from
the resulting code is an $O(n^{-1/2})$ fraction of the total.
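The weight concentration in this example can be observed directly. The sketch below builds such a paired sequence and collects the weights of its cyclic n-windows; it uses an odd window length, for which the claim about $\lfloor n/2 \rfloor$ and $\lceil n/2 \rceil$ is easy to see. The seed and function names are assumptions made for illustration.

```python
import random

def paired_sequence(M, seed=0):
    """Binary string of even length M built from independent pairs 01 or 10."""
    rng = random.Random(seed)
    s = []
    for _ in range(M // 2):
        s.extend(rng.choice([(0, 1), (1, 0)]))
    return s

def window_weights(s, n):
    """Set of Hamming weights of the cyclic n-windows of s."""
    m = len(s)
    return {sum(s[(j + i) % m] for i in range(n)) for j in range(m)}

S = paired_sequence(200)
print(sum(S), window_weights(S, 11))
```

Every window of odd length n consists of (n - 1)/2 complete pairs plus a single extra symbol, so only two weights can occur, and words of weight far from n/2 are never within a fixed distance R of the code.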

It would be interesting to know whether the characteristic function of
quadratic residues mod p is a (near?) de Bruijn covering code whenever
$p = \Omega(2^n/n^R)$. Other possibilities for random-like constructions include the
image of [0, (p + 1)/2] under the map $s \mapsto s^k$ with (k, p - 1) = 1, and
the image of [0, (p - 1)/2] under the map $s \mapsto r^s$, for some primitive root
r. Unfortunately, because of the above example, the Fourier coefficients of
these sets (which are known to be small) tell us nothing about how well they
cover Hamming space.



5 Numerical Bounds 

It is of interest to know M{n, R, q) for small values of its parameters - in 
particular, for q = 2, i.e., the binary case. First, we collect a few simple 
observations. 

1. $M(n, R, q) \le M(n + k, R - l, q + m)$ for any $k, l, m \ge 0$. If a de
Bruijn covering code C exists for parameters (n + k, R - l, q + m), then
certainly decreasing the dimension, increasing the radius, or decreasing
the number of symbols will leave C covering everything. (In the case
of decreasing the number of symbols, we can replace all occurrences of
the excluded symbols with "0". It is easy to check that this operation
can only decrease distances from n-strings to the code.)

2. $M(n, 0, q) = q^n$, as noted in the introduction.

3. M(n, R, q) = 1 if $R \ge n$, by taking the string "0".

4. M(n, R, 2) = 2 if $\lfloor n/2 \rfloor \le R < n$, by taking the string "01". The
two resulting codewords are complements in the n-cube, and therefore
every string is within $\lfloor n/2 \rfloor$ of one of them. Furthermore, it is clear
that at least 2 codewords are necessary.

5. $M(n, R, q) \ge K_q(n, R)$, the smallest number of codewords in a q-ary
covering code of dimension n and radius R.

6. $M(n, R, q) \ne M$ if $\min\{|n \bmod M|, |(-n) \bmod M|\} < n - 2R - 1$,
where $|x \bmod y|$ means the least nonnegative representative of x
modulo y. Indeed, if an (n, R, q)-de Bruijn covering code $S = (s_0, \ldots, s_{M-1})$
exists, then every string of n consecutive symbols has weight

\[ \left\lfloor \frac{n}{M} \right\rfloor \mathrm{wt}(S) + \mathrm{wt}(s_i, \ldots, s_{i+A-1}) \]

for some i, where the indices are taken modulo M and $A = |n \bmod M|$.
Similarly, each such string has weight

\[ \left( \left\lfloor \frac{n}{M} \right\rfloor + 1 \right) \mathrm{wt}(S) - \mathrm{wt}(s_i, \ldots, s_{i+B-1}) \]

for some i, where $B = |(-n) \bmod M|$. Therefore, any two codewords
appearing in S can differ by at most $C = \min\{A, B\}$ in weight. If
$C < n - 2R - 1$, then either the string $0^n$ or the string $1^n$ is at least a
distance R + 1 from any codeword.

7. Every (n, R, 2)-de Bruijn covering code has a run of $\lfloor n/(R + 1) \rfloor$
consecutive 0's and a run of $\lfloor n/(R + 1) \rfloor$ consecutive 1's. Suppose a code
did not contain $0^k$ with $k = \lfloor n/(R + 1) \rfloor$. Then every element of the
code has weight at least $\lfloor n/k \rfloor \ge R + 1$, so the word $0^n$ is not covered,
a contradiction. An identical argument applies to the case of a run of
1's.

8. If there exists an (n, R, q)-de Bruijn covering code of length M, then
there exists one of length M + n + k - 1 for all $k \ge 0$. If S is the shorter
string, append a copy of the first (n - 1) symbols and k arbitrary q-ary
symbols to the end.

9. If there exists an (n, R, q)-de Bruijn covering code of length M(n, R, q)
that somewhere contains the string $a^{n-1}$, then there exists an (n, R, q)-
de Bruijn covering code of all lengths longer than M(n, R, q). We may
simply insert more copies of a into the string to generate longer ones.

10. There are at least M(n, R, q) (n, R, q)-de Bruijn covering codes of length
M(n, R, q). Since M(n, R, q) is minimal, no such string has period less
than M(n, R, q), since otherwise we could truncate after a single period
and achieve a smaller de Bruijn covering code with the same
parameters. Therefore, all cyclic translations of any de Bruijn covering code,
which are each themselves de Bruijn covering codes, are distinct.
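Several of these observations can be confirmed by brute force for tiny parameters. The sketch below is an illustrative exhaustive search over binary strings; it is an assumption-laden stand-in and not the software used to build the table, and the function names are hypothetical.

```python
from itertools import product

def covers(s, n, R):
    """True if the binary string s is an (n, R, 2)-de Bruijn covering code."""
    m = len(s)
    code = {tuple(s[(j + i) % m] for i in range(n)) for j in range(m)}
    return all(any(sum(a != b for a, b in zip(w, c)) <= R for c in code)
               for w in product((0, 1), repeat=n))

def exists_code(n, R, M):
    """Exhaustively test all binary strings of length M."""
    return any(covers(s, n, R) for s in product((0, 1), repeat=M))

def min_length(n, R, max_M):
    """Smallest M <= max_M admitting an (n, R, 2) code, or None."""
    for M in range(1, max_M + 1):
        if exists_code(n, R, M):
            return M
    return None

print(min_length(3, 0, 8))         # observation 2: M(3, 0, 2) = 2^3
print(min_length(4, 2, 4))         # observation 4: M(4, 2, 2) = 2
print(covers((1, 1, 0, 0), 10, 4))
print(exists_code(10, 4, 5))       # excluded by observation 6, since 5 divides 10
```

The search space doubles with each additional symbol, so this approach is only feasible for very short strings.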

Below, we include a table of the best known bounds on the sizes of bi- 
nary de Bruijn covering codes with various parameters. A single number in 
an entry indicates that the exact value of M{n, R, 2) is known; two numbers 
indicate an upper and lower bound. Bounds were achieved using the obser- 
vations above, the table in ^U], as well as software that searched the string 
space randomly (for upper bounds), and one which searched it exhaustively 
(for lower bounds). A few hundred hours of computing time on a 1.8 GHz 
Intel-based PC were used to construct this table. 



6 Remarks and Further Questions 

Statement 8 in the previous section highlights a frustrating property of de
Bruijn covering codes that stands in stark contrast to ordinary covering codes: 
it is possible for one to exist of length M but for none to exist of length M+1. 
For example, a (10,4,2) code exists of lengths 4 ("1100"), 6 ("011100"), 8 
("00111100"), and 12 ("000011111100"), but none of lengths 5, 7, 9, 10, or 
11 exist. However, by the above, a (10,4,2) code of all lengths at least 13 
must exist. Therefore, in addition to finding the smallest possible de Bruijn 
covering code, we would like to know when de Bruijn covering codes with 
lengths between M{n, R, q) and M(n, R, q) + n — 1 exist. 

Table 1: Best known bounds for M(n, R, 2)

Another difference between de Bruijn covering codes and ordinary ones is
that there is no easy way to use known efficient codes to build efficient codes
for larger n, smaller R, or larger q. It would be desirable to define a
"product" analogous to direct sums for ordinary covering codes. Unfortunately,
interlacing, the obvious candidate for such a product, appears to be very
inefficient. We offer a different, though related, construction which allows us
to increase q when the desired number of symbols is a perfect power of the
number of symbols in the original code.

Proposition 6. If $a^s = b$ for any positive integers a, b, and s, then for all
$n, R \ge 0$,

\[ M(n, R, b) \le s \left\lceil \frac{M(sn, R, a) + sn}{s} \right\rceil - 1. \]

Proof. Let t = M(sn, R, a) and $m = s^2 \lceil (t + sn)/s \rceil - s$, and let
$C = (c_0, \ldots, c_{t-1})$ be a minimum-length (sn, R, a)-de Bruijn covering code. We
construct an $(n, R, a^s)$-de Bruijn covering code $C' = (c'_0, \ldots, c'_{m/s-1})$ of length
m/s. Choose some bijection $\sigma$ between $(\mathbb{Z}/a\mathbb{Z})^s$ and $\mathbb{Z}/a^s\mathbb{Z}$, and define

\[ c'_j = \sigma\left(c_{|sj \bmod (m/s)|}, \ldots, c_{|sj + s - 1 \bmod (m/s)|}\right), \]

with indices on the left hand side taken modulo m/s and indices on the right
hand side taken modulo t. Write $C'' = (c''_0, \ldots, c''_{m-1})$, with
$c''_i = c_{|i \bmod (m/s)|}$, for the underlying a-ary string, which consists of s
blocks of length m/s. Evidently, C' is well defined, since $s \mid m$. Now,
suppose $X = (x_0, \ldots, x_{n-1})$ is an n-string over $a^s$ symbols. We claim that
there is some codeword in the set of consecutive n-strings of C' which is
within R symbols of X.

Indeed, let $x'_j = \sigma^{-1}(x_j)$ for $0 \le j < n$ and define $X' = x'_0 \cdots x'_{n-1}$, a
string of length sn. Then some string X'' which differs from X' in at most
R symbols occurs somewhere in C, say, beginning at coordinate k. X'' must
occur at least s times in C'', at coordinates k + jm/s for $0 \le j < s$. (If X''
"wraps around" in C, the extra $\ge sn - 1$ symbols at the end of each block of
length m/s guarantee X'' appears in C''.) Furthermore, since (m/s, s) = 1,
the numbers k + jm/s, $0 \le j < s$, represent all residue classes modulo s, so
there is some r so that $k + rm/s \equiv 0 \bmod s$. Then the string

\[ \sigma\left(c''_{k+rm/s}, \ldots, c''_{k+rm/s+s-1}\right) \cdots \sigma\left(c''_{k+rm/s+(n-1)s}, \ldots, c''_{k+rm/s+ns-1}\right) \]

appears in C' and at most R of its coordinates differ from those of X. □

The most obvious question arising from the subject of the present work
is the issue of whether the bound stated in Theorem 1 is best possible, i.e.,
whether the log factor can be dropped or the result can be extended to q's
which are not prime powers. We also would like to explain why so many of
the entries in Table 1 are even.